Android 10 and 11 Benchmarks and ARM big.LITTLE Architecture Issues
The Whetstone benchmark carries out both single precision floating point and integer calculations, the overall MWIPS rating being mainly dependent on the former. System 2 was faster on every test, by between 1.26 and 2.35 times.
System 1 Android 10

 ARM/Intel Native Whetstone Benchmark 4A8 22-Jul-2021 21.48
 Compiled for 64 bit ARM v8a

 Test          MFLOPS      MOPS   millisecs        Results

 N1 float      455.57                 0.042   -1.124750137
 N2 float      548.24                 0.245   -1.131330490
 N3 if                  2378.96       0.044    1.000000000
 N4 fixpt               2478.69       0.127   12.000000000
 N5 cos                   91.66       0.908    0.499109805
 N6 float      495.56                 1.088    0.999999821
 N7 equal               1049.83       0.176    3.000000000
 N8 exp                   47.34       0.786    0.935364604

 MWIPS        2927.45                 3.416

 Total Elapsed Time 16.5 seconds

System 2 Android 11

 ARM/Intel Native Whetstone Benchmark 4A8 24-Jul-2021 18.14
 Compiled for 64 bit ARM v8a

 Test          MFLOPS      MOPS   millisecs        Results

 N1 float     1068.97                 0.018   -1.124750137
 N2 float      884.76                 0.152   -1.131330490
 N3 if                  3008.47       0.034    1.000000000
 N4 fixpt               5014.23       0.063   12.000000000
 N5 cos                  142.47       0.584    0.499109805
 N6 float      800.99                 0.673    0.999999821
 N7 equal               2005.59       0.092    3.000000000
 N8 exp                   69.70       0.534    0.935364604

 MWIPS        4650.43                 2.150

 Total Elapsed Time 16.2 seconds

System 2/1 Comparison

 N1 float        2.35                          1.000000000
 N2 float        1.61                          1.000000000
 N3 if                     1.26                1.000000000
 N4 fixpt                  2.02                1.000000000
 N5 cos                    1.55                1.000000000
 N6 float        1.62                          1.000000000
 N7 equal                  1.91                1.000000000
 N8 exp                    1.47                1.000000000

 MWIPS           1.59
The Dhrystone integer benchmark produces a performance rating in VAX MIPS (also known as DMIPS).
System 2 was indicated as being 1.69 times faster than System 1. Results are often quoted as DMIPS per MHz, in this case 4.16 and 7.03 for Systems 1 and 2 respectively.
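The arithmetic behind these ratings can be sketched as follows. This is only an illustration: it assumes the conventional reference of 1757 Dhrystones per second for 1 VAX MIPS, and it assumes a 2000 MHz clock for both systems (the clock is not part of the benchmark output, and `ASSUMED_MHZ` is a name chosen here).

```python
# Conventional reference: a VAX 11/780 (1 MIPS) ran 1757 Dhrystones per second.
VAX_DHRYSTONES_PER_SECOND = 1757

# Assumption for illustration only: both phones are treated as 2000 MHz.
ASSUMED_MHZ = 2000


def dmips(dhrystones_per_second):
    """Convert Dhrystones per second to a VAX MIPS (DMIPS) rating."""
    return dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND


def dmips_per_mhz(dhrystones_per_second, cpu_mhz):
    """Normalise the DMIPS rating by CPU clock frequency."""
    return dmips(dhrystones_per_second) / cpu_mhz


# Dhrystones per second as reported for the two systems.
reported = {"System 1": 14614554, "System 2": 24688271}

for name, per_second in reported.items():
    rating = dmips(per_second)
    print(f"{name}: {rating:.0f} DMIPS, "
          f"{dmips_per_mhz(per_second, ASSUMED_MHZ):.2f} DMIPS/MHz")
```

With these assumptions the sketch reproduces the 8318 and 14051 VAX MIPS ratings and the 4.16 and 7.03 DMIPS per MHz figures quoted above.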
System 1 Android 10

 ARM/Intel Dhrystone 2 Benchmark 4A8 22-Jul-2021 21.45
 Compiled for 64 bit ARM v8a

 Nanoseconds one Dhrystone run         68
 Dhrystones per Second           14614554
 VAX MIPS rating                     8318

System 2 Android 11

 ARM/Intel Dhrystone 2 Benchmark 4A8 24-Jul-2021 20.59
 Compiled for 64 bit ARM v8a

 Nanoseconds one Dhrystone run         41
 Dhrystones per Second           24688271
 VAX MIPS rating                    14051

System 2/1 Comparison

 VAX MIPS rating                     1.69
The Linpack benchmark speed is measured in MFLOPS, the original being for double precision (DP) floating point calculations. The single precision (SP) version was produced as the early ARM processors did not include SIMD DP instructions. NEON SP SIMD operations were included later. Results for this Linpack benchmark code should not be compared with those from the High Performance Linpack (HPL) benchmark.
System 2/1 performance ratios varied between 1.77 and 2.15, the latter from the version using NEON intrinsic functions.
System 1 Android 10

 ARM/Intel DP Linpack Benchmark 4A8 16-Jun-2021 23.59
 Compiled for 64 bit ARM v8a

 Speed            1121.87 MFLOPS

 norm. resid                 1.7
 resid            7.41628980e-14
 machep           2.22044605e-16
 x[0]-1          -1.49880108e-14
 x[n-1]-1        -1.89848137e-14

System 2 Android 11

 ARM/Intel DP Linpack Benchmark 4A8 24-Jul-2021 21.00
 Compiled for 64 bit ARM v8a
                                    System 2/1
                                    Comparison
 Speed            1985.71 MFLOPS       1.77

 norm. resid                 1.7       1.00000
 resid            7.41628980e-14       1.00000
 machep           2.22044605e-16       1.00000
 x[0]-1          -1.49880108e-14       1.00000
 x[n-1]-1        -1.89848137e-14       1.00000

System 1 Android 10

 ARM/Intel SP Linpack Benchmark 4A8 17-Jun-2021 00.00
 Compiled for 64 bit ARM v8a

 Speed            1116.97 MFLOPS

 norm. resid                 1.6
 resid            3.80277634e-05
 machep           1.19209290e-07
 x[0]-1          -1.38282776e-05
 x[n-1]-1        -7.51018524e-06

System 2 Android 11

 ARM/Intel SP Linpack Benchmark 4A8 24-Jul-2021 21.01
 Compiled for 64 bit ARM v8a
                                    System 2/1
                                    Comparison
 Speed            2144.87 MFLOPS       1.92

 norm. resid                 1.6       1.00000
 resid            3.80277634e-05       1.00000
 machep           1.19209290e-07       1.00000
 x[0]-1          -1.38282776e-05       1.00000
 x[n-1]-1        -7.51018524e-06       1.00000

System 1 Android 10

 ARM NEON Linpack Benchmark 4A8 22-Jul-2021 22.17
 Compiled for 64 bit ARM v8a

 Speed            2146.12 MFLOPS

 norm. resid                 1.6
 resid            3.80277634e-05
 machep           1.19209290e-07
 x[0]-1          -1.38282776e-05
 x[n-1]-1        -7.51018524e-06

System 2 Android 11

 ARM NEON Linpack Benchmark 4A8 24-Jul-2021 21.24
 Compiled for 64 bit ARM v8a
                                    System 2/1
                                    Comparison
 Speed            4620.54 MFLOPS       2.15

 norm. resid                 1.6       1.00000
 resid            3.80277634e-05       1.00000
 machep           1.19209290e-07       1.00000
 x[0]-1          -1.38282776e-05       1.00000
 x[n-1]-1        -7.51018524e-06       1.00000
Below are MFLOPS scores for the 24 kernels, at one data span, and overall ratings of Maximum, Average, Geometric mean, Harmonic mean and Minimum MFLOPS. System 2 improvements for the 24 loops were between 1.54 and 2.38 times, and the official (Geometric mean) average improved by 1.87 times.
This was the benchmark used to evaluate relative performance of the first supercomputers at Livermore Laboratory, where the Cray 1 was purchased for $7 Million in 1978. Then, the 24 loops geometric mean speed was 11.9 MFLOPS. The System 2 phone, costing some $200 in 2021, was 123 times faster.
Also, the Cray 1 weighed 10,500 pounds and had a 115 kilowatt power supply.
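As a rough illustration of the five summary ratings, the sketch below applies Python's standard `statistics` functions to the 24 System 2 scores at Do Span 471. The benchmark's own overall figures are weighted across three spans (471, 90 and 19), so these values are not expected to match the reported ones exactly. The function name `summarise` is chosen here for illustration.

```python
from statistics import fmean, geometric_mean, harmonic_mean

# System 2 MFLOPS for the 24 loops at Do Span 471, copied from the results.
system2_span_471 = [
    2577.1, 1851.7, 1597.1, 1633.0, 773.7, 1402.3, 2552.8, 2943.6,
    2725.4, 1858.0, 962.5, 2080.3, 513.6, 740.7, 1355.8, 1525.6,
    1484.4, 2586.3, 699.1, 1891.1, 1733.1, 1288.2, 1517.8, 658.1,
]


def summarise(mflops):
    """Return the five summary ratings for a list of loop speeds."""
    return {
        "maximum": max(mflops),
        "average": fmean(mflops),
        "geomean": geometric_mean(mflops),
        "harmean": harmonic_mean(mflops),
        "minimum": min(mflops),
    }


summary = summarise(system2_span_471)
for name, value in summary.items():
    print(f"{name:8s} {value:8.1f}")

# Cray 1 comparison, using the reported overall geometric means.
cray1_geomean = 11.9
system2_geomean = 1467.8
print(f"System 2 / Cray 1 = {system2_geomean / cray1_geomean:.0f}")
```

The geometric mean is less influenced by a few very fast loops than the arithmetic average, which is one reason it serves as the official rating.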
System 1 Android 10

 ARM/Intel Livermore Loops Benchmark 4A8 22-Jul-2021 22.42
 Compiled for 64 bit ARM v8a

 MFLOPS for 24 loops Do Span 471

  1410.7   899.2   878.3   869.0   494.6   711.3  1655.2  1816.5
  1713.6   845.2   495.8  1030.7   274.5   466.7   658.8   776.5
   931.6  1261.7   455.5   796.2   947.2   742.1   894.9   374.9

 Overall Weighted MFLOPS Do Spans 471, 90, 19

 Maximum  Average  Geomean  Harmean  Minimum
  1816.5    877.3    786.1    699.6    269.2

 Results of last two calculations
 4.850340602749970e+02 1.300000000000000e+01

 Total Elapsed Time 9.1 seconds

System 2 Android 11

 ARM/Intel Livermore Loops Benchmark 4A8 24-Jul-2021 21.02
 Compiled for 64 bit ARM v8a

 MFLOPS for 24 loops Do Span 471

  2577.1  1851.7  1597.1  1633.0   773.7  1402.3  2552.8  2943.6
  2725.4  1858.0   962.5  2080.3   513.6   740.7  1355.8  1525.6
  1484.4  2586.3   699.1  1891.1  1733.1  1288.2  1517.8   658.1

 Overall Weighted MFLOPS Do Spans 471, 90, 19

 Maximum  Average  Geomean  Harmean  Minimum
  2943.6   1620.0   1467.8   1310.5    513.6

 Results of last two calculations
 4.850340602749970e+02 1.300000000000000e+01

 Total Elapsed Time 8.8 seconds

System 2/1 Comparison

 MFLOPS for 24 loops Do Span 471

    1.83    2.06    1.82    1.88    1.56    1.97    1.54    1.62
    1.59    2.20    1.94    2.02    1.87    1.59    2.06    1.96
    1.59    2.05    1.63    2.38    1.83    1.74    1.70    1.76

 Maximum  Average  Geomean  Harmean  Minimum
    1.62     1.85     1.87     1.87     1.91

 Results of last two calculations were identical
This benchmark measures data reading speeds in MegaBytes per second carrying out calculations on arrays of cache and RAM data, sized 2 x 8 KB to 2 x 32 MB. Calculations are x[m]=x[m]+s*y[m] and x[m]=x[m]+y[m], using double and single precision (DP and SP) floating point and x[m]=x[m]+s+y[m] and x[m]=x[m]+y[m] with integers. Million Floating Point Operations Per Second (MFLOPS) speed can be calculated by dividing DP MB/second by 8 and 16, for the two tests, and SP speeds by 4 and 8.
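The conversion from MB/second to MFLOPS described above can be sketched as a simple lookup of bytes read per floating point operation. The table and function names are chosen here for illustration; the divisors are the ones given in the text.

```python
# Bytes read per floating point operation, as given in the text:
# DP divides by 8 and 16 for the two tests, SP by 4 and 8.
BYTES_PER_FLOP = {
    ("DP", "x+s*y"): 8,
    ("DP", "x+y"): 16,
    ("SP", "x+s*y"): 4,
    ("SP", "x+y"): 8,
}


def mflops(mb_per_second, precision, test):
    """Convert a MemSpeed reading speed into MFLOPS."""
    return mb_per_second / BYTES_PER_FLOP[(precision, test)]


# Best L1 cache speeds for x[m]=x[m]+s*y[m], copied from the results.
print(f"System 1 DP {mflops(11497, 'DP', 'x+s*y'):.0f} MFLOPS")
print(f"System 2 DP {mflops(14074, 'DP', 'x+s*y'):.0f} MFLOPS")
print(f"System 2 SP {mflops(12493, 'SP', 'x+s*y'):.0f} MFLOPS")
```

These agree with the Max MFLOPS lines in the results, which are derived from the fastest multiply and add entries.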
The System 2/1 comparisons indicate all round gains for System 2, lowest for RAM based data and best from L3/L2 shared caches.
System 1 Android 10 ARM/Intel MemSpeed Benchmark 4A8 19-Jun-2021 10.15 Compiled for 64 bit ARM v8a Reading Speed in MBytes/Second Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m] KBytes Dble Sngl Int Dble Sngl Int 16 11458 9732 10965 12118 8065 8177 L1 32 11497 9750 10976 12078 8050 8149 64 11449 9724 10927 12216 8065 8184 128 9730 8731 9266 9712 7230 7280 L2 256 9308 8964 9247 9082 7312 7438 512 9292 8985 9277 9244 7441 7488 1024 8375 8098 8341 8394 6877 6896 4096 6333 6268 6304 6302 6051 6085 RAM 16384 6242 6235 6196 6261 6057 5969 65536 6345 6270 6303 6304 6059 6157 Total Elapsed Time 9.5 seconds Max MFLOPS 1437 2438 System 2 Android 11 ARM/Intel MemSpeed Benchmark 4A8 24-Jul-2021 22.23 Compiled for 64 bit ARM v8a Reading Speed in MBytes/Second Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m] KBytes Dble Sngl Int Dble Sngl Int 16 14069 12493 13326 26051 13150 12846 L1 32 14074 12343 13339 26128 13038 12846 64 14057 12178 13329 25900 12540 12700 128 13456 11975 13002 21384 12183 12453 L2 256 13209 11801 12676 20306 12026 12150 512 13118 11803 12616 20400 12001 12151 1024 13221 11859 12751 20865 12074 12232 L3 2 MB 4096 10233 10003 10096 9517 9830 8972 RAM 16384 8208 8429 8312 7815 8001 7543 65536 7912 7935 7918 7442 7665 7400 Total Elapsed Time 11.7 seconds Max MFLOPS 1759 3123 System 2/1 Comparison KBytes Dble Sngl Int Dble Sngl Int Average 16 1.23 1.28 1.22 2.15 1.63 1.57 1.51 32 1.22 1.27 1.22 2.16 1.62 1.58 1.51 64 1.23 1.25 1.22 2.12 1.55 1.55 1.49 128 1.38 1.37 1.40 2.20 1.69 1.71 1.63 256 1.42 1.32 1.37 2.24 1.64 1.63 1.60 512 1.41 1.31 1.36 2.21 1.61 1.62 1.59 1024 1.58 1.46 1.53 2.49 1.76 1.77 1.76 4096 1.62 1.60 1.60 1.51 1.62 1.47 1.57 16384 1.31 1.35 1.34 1.25 1.32 1.26 1.31 65536 1.25 1.27 1.26 1.18 1.27 1.20 1.24 |
This benchmark carries out the same calculations as the MemSpeed Benchmark, except they are all in single precision, for comparison with NEON sections. The latter are carried out using NEON intrinsic functions.
System 2 is indicated as being faster on all tests, by between 1.11 and 9.24 times, the best gains being with SIMD NEON instructions, where the maximum single precision MFLOPS was the highest I have recorded so far. Based on experience with Intel processors and 128 bit registers, containing four 32 bit words, the maximum possible speed with a 2 GHz clock could be 8 GFLOPS, or 16 GFLOPS with fused (or linked) multiply and add. The 9.69 GFLOPS, shown below, indicates some use of fused operations. A later example recorded 12.8 GFLOPS using 32 floating point operations per data word read.
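The peak speed estimate above can be written out as a short calculation. This is a sketch under the same assumptions as the text: one SIMD instruction completed per clock cycle on one core, 128 bit registers and 32 bit words. The function name and parameters are illustrative only.

```python
def peak_gflops(clock_ghz, register_bits=128, word_bits=32,
                fused_multiply_add=False):
    """Estimate peak GFLOPS for one core issuing one SIMD
    instruction per clock cycle (an assumption, as in the text)."""
    lanes = register_bits // word_bits          # words per register
    ops_per_lane = 2 if fused_multiply_add else 1
    return clock_ghz * lanes * ops_per_lane


separate = peak_gflops(2.0)
fused = peak_gflops(2.0, fused_multiply_add=True)
measured = 9.691  # System 2 NeonSpeed Max MFLOPS, expressed in GFLOPS

print(f"Peak without fusing {separate:.1f} GFLOPS")
print(f"Peak with fusing    {fused:.1f} GFLOPS")
print(f"Measured fraction of fused peak {measured / fused:.2f}")
```

A measured speed above the unfused estimate but below the fused one is consistent with the suggestion that fused operations were involved.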
System 1 Android 10 ARM NeonSpeed Benchmark 4A8 22-Jul-2021 22.18 Compiled for 64 bit ARM v8a Vector Reading Speed in MBytes/Second Memory Float v=v+s*v Int v=v+v+s Neon v=v+v KBytes Norm Neon Norm Neon Float Int 16 9241 14779 10559 13864 12703 13179 L1 32 9283 14825 11015 7090 5829 6142 64 4227 6772 4989 6313 9442 13709 128 8515 9179 9198 8787 10010 10032 L2 256 8772 8545 9332 9425 9399 9408 512 8665 8437 9302 9342 9316 9337 1024 7657 7105 8002 8129 8075 8105 4096 6113 6126 6189 6133 6239 6234 RAM 16384 6084 6123 6167 6099 6159 6158 65536 6361 6225 6386 5940 6416 6468 Total Elapsed Time 9.5 seconds Max MFLOPS 2321 3706 System 2 Android 11 ARM NeonSpeed Benchmark 4A8 24-Jul-2021 22.28 Compiled for 64 bit ARM v8a Vector Reading Speed in MBytes/Second Memory Float v=v+s*v Int v=v+v+s Neon v=v+v KBytes Norm Neon Norm Neon Float Int 16 12793 38764 13458 42313 53790 53849 L1 32 12798 38713 13463 42599 53877 53788 64 12784 38353 13452 42230 44110 44309 128 12608 28514 13361 28752 28891 28856 L2 256 12394 27811 13169 27813 27915 27954 512 12459 27508 13224 27615 27969 27972 1024 12457 25436 13155 25224 25289 25075 L3 2 MB 4096 10213 7808 10226 9124 9539 9401 RAM 16384 8161 7470 8209 7302 7569 7575 65536 7850 7269 7782 6699 7245 7208 Total Elapsed Time 10.4 seconds Max MFLOPS 3200 9691 System 2/1 Comparison KBytes Norm Neon Norm Neon Float Int Average 16 1.38 2.62 1.27 3.05 4.23 4.09 2.78 32 1.38 2.61 1.22 6.01 9.24 8.76 4.87 64 3.02 5.66 2.70 6.69 4.67 3.23 4.33 128 1.48 3.11 1.45 3.27 2.89 2.88 2.51 256 1.41 3.25 1.41 2.95 2.97 2.97 2.50 512 1.44 3.26 1.42 2.96 3.00 3.00 2.51 1024 1.63 3.58 1.64 3.10 3.13 3.09 2.70 4096 1.67 1.27 1.65 1.49 1.53 1.51 1.52 16384 1.34 1.22 1.33 1.20 1.23 1.23 1.26 65536 1.23 1.17 1.22 1.13 1.13 1.11 1.17 |
This benchmark is designed to identify reading of data in bursts over buses. The program starts by reading a word (4 bytes) with an address increment of 32 words (128 bytes) before reading another word. The increment is halved on successive tests, until all data is read. On reading data from RAM, 64 byte bursts are typically used. Then, measured reading speed reduces from a maximum, when all data is read, to a minimum on using 16 word increments (64 bytes). Potential maximum bus speed can be estimated by multiplying the Inc16 value by 16. Then, for each halving of the increment, a near doubling of MB/second could be expected. This is not the case here between 2 word and 1 word increments, with System 2 being the worst. However, see the MP-BusSpeed results, which suggest that access by multiple cores is necessary to obtain maximum memory throughput.
Speeds from caches can also increase on reducing address increments, suggesting burst reading, with System 2 indicating improved performance. Normally, only the read all comparisons are calculated. In this case, System 2 is indicated as being slower at 512 and 1024 KB (L2 to L3 caches), due to those address increment complications. The benchmark probably requires longer running times for greater accuracy.
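The access pattern and the bus speed estimate described above can be sketched as follows. This is not the benchmark's code: the Python list stands in for an array of 4 byte words, and the function names are chosen here for illustration. The 64 byte burst size is the typical value mentioned in the text.

```python
WORD_BYTES = 4
BURST_BYTES = 64  # typical RAM burst size, as stated in the text


def increments(start=32):
    """Address increments in words: 32, 16, 8, 4, 2, then 1 (read all)."""
    steps = []
    inc = start
    while inc >= 1:
        steps.append(inc)
        inc //= 2
    return steps


def strided_read(data, increment):
    """Read one word, skip to the next address increment, and repeat."""
    total = 0
    for i in range(0, len(data), increment):
        total &= data[i]  # cheap operation so the word is actually used
    return total


def estimated_max_bus_speed(inc16_mb_per_second):
    """At 16 word increments only one word per 64 byte burst is used,
    so the bus is assumed to move 16 times the measured data."""
    return inc16_mb_per_second * (BURST_BYTES // WORD_BYTES)


words = [0xFFFFFFFF] * 4096
for inc in increments():
    strided_read(words, inc)

print(estimated_max_bus_speed(642))  # System 1, best RAM Inc16 value
print(estimated_max_bus_speed(875))  # System 2, best RAM Inc16 value
```

The two estimates correspond to the "Max Bus Speed?" lines in the results, 10272 and 14000 MB/second.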
System 1 Android 10 ARM/Intel BusSpeed Benchmark 4A8 22-Jul-2021 21.51 Compiled for 64 bit ARM v8a Reading Speed 4 Byte Words in MBytes/Second Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read KBytes Words Words Words Words Words All L1 16 6064 6524 7171 7538 7557 7891 32 5126 5101 5123 5375 3670 3608 64 1970 2149 2914 3612 4857 4712 128 546 551 1212 2676 4973 7888 L2 256 1009 1010 2178 3450 5325 7909 512 1007 1005 2160 3540 5231 7728 1024 513 574 1694 2154 3976 7124 4096 581 606 1461 2784 4981 7561 RAM 16384 580 614 1430 2793 4917 7557 65536 612 642 1375 2712 4851 7482 Total Elapsed Time 5.0 seconds Max Bus Speed? 642 x 16 = 10272 MB/second System 2 Android 11 ARM/Intel BusSpeed Benchmark 4A8 24-Jul-2021 22.26 Compiled for 64 bit ARM v8a Reading Speed 4 Byte Words in MBytes/Second Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read KBytes Words Words Words Words Words All 16 6850 6978 7653 7907 7947 7948 L1 32 7559 7652 7675 7948 7958 7948 64 6229 6271 7799 7932 7950 7943 128 1863 3264 5571 7823 7911 7946 L2 256 1414 2208 4398 7319 7883 7937 512 1006 1839 3650 7219 6000 7056 1024 919 1587 3093 5806 7779 6552 L3 2 MB 4096 645 1059 2165 4078 7285 7867 RAM 16384 583 875 1825 3688 7089 7874 65536 569 852 1750 3585 6938 7881 Total Elapsed Time 5.3 seconds Max Bus Speed? 875 x 16 = 14000 MB/second System 2/1 Comparison KBytes Words Words Words Words Words All Average 16 1.13 1.07 1.07 1.05 1.05 1.01 1.06 32 1.47 1.50 1.50 1.48 2.17 2.20 1.72 64 3.16 2.92 2.68 2.20 1.64 1.69 2.38 128 3.41 5.92 4.60 2.92 1.59 1.01 3.24 256 1.40 2.19 2.02 2.12 1.48 1.00 1.70 512 1.00 1.83 1.69 2.04 1.15 0.91 1.44 1024 1.79 2.76 1.83 2.70 1.96 0.92 1.99 4096 1.11 1.75 1.48 1.46 1.46 1.04 1.38 16384 1.01 1.43 1.28 1.32 1.44 1.04 1.25 65536 0.93 1.33 1.27 1.32 1.43 1.05 1.22 |
RandMem benchmark carries out four tests comprising serial and random address selections using the same program structure, with read and read/write tests, where the data read points to the next address, with no arithmetic calculations. The main purpose is to demonstrate how much slower performance can be through using random access. Here, speed can be considerably influenced by reading and writing in bursts, where much of the data is not used, and by the size of preceding caches.
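The idea of data that points to the next address can be sketched with an index chain, built once in serial order and once in shuffled order. This is an illustrative sketch, not the benchmark's code, and the function names are chosen here. Python itself will not show realistic cache effects; the point is only the structure of the test, where each read depends on the previous one.

```python
import random


def build_chain(n_words, randomise, seed=1):
    """Build an array where each entry holds the index of the next
    entry to read, forming one cycle through all n_words entries."""
    order = list(range(n_words))
    if randomise:
        random.Random(seed).shuffle(order)
    chain = [0] * n_words
    for pos in range(n_words):
        chain[order[pos]] = order[(pos + 1) % n_words]
    return chain


def chase(chain, steps, start=0):
    """Follow the chain: the data read selects the next address."""
    index = start
    for _ in range(steps):
        index = chain[index]
    return index


n = 4096
serial_chain = build_chain(n, randomise=False)
random_chain = build_chain(n, randomise=True)

# After n steps, either chain returns to its starting point,
# having visited every word exactly once.
print(chase(serial_chain, n), chase(random_chain, n))
```

With the serial chain, successive reads fall in the same burst or cache line. With the shuffled chain, most reads land in a different line, so much of each burst is wasted.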
System 2 was clearly faster on most tests, the best cases being affected by caching. The exceptions were due to strange behaviour with serial reading at 512 and 1024 KB, where System 2 speeds fell below those measured from RAM. The benchmark was repeated twice on System 2, confirming the same problem.
System 1 Android 10

 ARM/Intel RandMem Benchmark 4A8 22-Jul-2021 22.19
 Compiled for 64 bit ARM v8a

 MBytes/Second Transferring 4 Byte Words

  Memory     Serial.......     Random.......
  KBytes     Read   Rd/Wrt     Read   Rd/Wrt

      16     8911     7308     8913     7281  L1
      32     8939     7321     8941     7310
      64     8925     7308     8904     7305
     128     9024     7305     3297     3359  L2
     256     9180     7313     2282     2408
     512     9222     7170     1946     2087
    1024     6969     6039      509      679
    4096     8450     4821      165      194  RAM
   16384     8463     4850      138      160
   65536     8453     4865      133      156

 Total Elapsed Time 8.9 seconds

System 2 Android 11

 ARM/Intel RandMem Benchmark 4A8 24-Jul-2021 22.32
 Compiled for 64 bit ARM v8a

 MBytes/Second Transferring 4 Byte Words

  Memory     Serial.......     Random.......
  KBytes     Read   Rd/Wrt     Read   Rd/Wrt

      16    14412    15286    14038    13428  L1
      32    14464    15290    14042    13427
      64    14544    15199    13983    13342
     128    12595    13193     8580     7691  L2
     256    12522    13113     4874     4973
     512     9747    11936     1962     2467
    1024     9016    12918     1230     1566  L3 2 MB
    4096    12135     6933      527      530  RAM
   16384    11805     5987      408      411
   65536    12029     5626      386      382

 Total Elapsed Time 8.4 seconds

System 2/1 Comparison

  Memory     Serial.......     Random.......   Average
  KBytes     Read   Rd/Wrt     Read   Rd/Wrt

      16     1.62     2.09     1.58     1.84     1.78
      32     1.62     2.09     1.57     1.84     1.78
      64     1.63     2.08     1.57     1.83     1.78
     128     1.40     1.81     2.60     2.29     2.02
     256     1.36     1.79     2.14     2.07     1.84
     512     1.06     1.66     1.01     1.18     1.23
    1024     1.29     2.14     2.42     2.31     2.04
    4096     1.44     1.44     3.19     2.73     2.20
   16384     1.39     1.23     2.96     2.57     2.04
   65536     1.42     1.16     2.90     2.45     1.98
The benchmarks run code for single and double precision Fast Fourier Transforms of size 1024 to 1048576 (1K to 1024K), each one being run three times to identify variance. Results provided are running times in milliseconds. Besides Android, the benchmarks are available to run via Windows and Linux. Two versions are available: FFT1, the original version, and FFT3c, with optimised C code.
Memory used increases with FFT size, eventually being accessed from RAM, and is often accessed on a skipped sequential basis, leading to burst reading effects, as in the RandMem random access tests. Again, there were all round System 2 performance improvements, this time between 1.26 and 2.90 times for the original benchmark and between 1.22 and 2.43 times for the optimised one.
System 1 Android 10 ARM/Intel FFT Benchmark 1 4A8 22-Jul-2021 21.56 Compiled for 64 bit ARM v8a Size milliseconds Average K Single Precision Double Precision SP DP 1 0.049 0.046 0.045 0.041 0.041 0.041 0.047 0.041 2 0.098 0.098 0.097 0.088 0.087 0.087 0.098 0.087 4 0.210 0.229 0.211 0.199 0.197 0.197 0.217 0.198 8 0.472 0.469 0.470 0.577 0.575 0.577 0.470 0.576 16 1.264 1.259 1.262 1.341 1.354 1.341 1.262 1.345 32 2.884 2.856 2.868 3.044 3.016 3.002 2.869 3.021 64 6.333 6.370 6.332 12.711 12.730 12.613 6.345 12.685 128 23.624 23.341 23.459 58.103 80.489 72.876 23.475 70.489 256 135.025 130.145 126.713 177.673 190.005 181.423 130.628 183.034 512 332.983 334.722 340.869 458.987 423.731 439.251 336.191 440.656 1024 826.862 830.598 769.849 1027.110 988.710 981.118 809.103 998.979 1024 Square Check Maximum Noise Average Noise SP 9.999520e-01 3.346482e-06 4.565234e-11 DP 1.000000e+00 1.133294e-23 1.428110e-28 Total Elapsed Time 10.4 seconds System 2 Android 11 and Comparisons ARM/Intel FFT Benchmark 1 4A8 24-Jul-2021 22.30 Compiled for 64 bit ARM v8a Benchmark 1 Size milliseconds Average Compare Sys 2/1 K Single Precision Double Precision SP DP SP DP 1 0.032 0.029 0.028 0.029 0.028 0.028 0.030 0.028 1.57 1.45 2 0.061 0.060 0.060 0.061 0.060 0.060 0.060 0.060 1.62 1.45 4 0.130 0.129 0.129 0.134 0.133 0.133 0.129 0.133 1.68 1.48 8 0.283 0.433 0.331 0.460 0.453 0.454 0.349 0.456 1.35 1.26 16 0.942 0.942 0.939 1.114 1.150 0.903 0.941 1.056 1.34 1.27 32 1.840 1.615 1.599 2.190 2.180 2.090 1.685 2.153 1.70 1.40 64 4.416 4.323 4.290 5.636 5.453 5.562 4.343 5.550 1.46 2.29 128 14.552 11.566 11.381 29.090 26.732 24.610 12.500 26.811 1.88 2.63 256 45.709 44.106 45.251 63.553 64.000 64.008 45.022 63.854 2.90 2.87 512 117.698 117.816 117.043 151.497 153.442 151.532 117.519 152.157 2.86 2.90 1024 301.501 290.818 289.458 347.153 345.639 345.158 293.926 345.983 2.75 2.89 1024 Square Check Maximum Noise Average Noise SP 9.999520e-01 3.346482e-06 4.565234e-11 DP 1.000000e+00 1.133294e-23 
1.428110e-28 System 2/1 SP 1.0000000000 1.0000000000 1.0000000000 DP 1.0000000000 1.0000000000 1.0000000000 Total Elapsed Time 4.1 seconds |
System 1 Android 10 ARM/Intel FFT Benchmark 3c 4A8 22-Jul-2021 21.57 Compiled for 64 bit ARM v8a Size milliseconds Average K Single Precision Double Precision SP DP 1 0.065 0.054 0.054 0.056 0.050 0.050 0.058 0.052 2 0.120 0.114 0.114 0.105 0.106 0.106 0.116 0.106 4 0.256 0.244 0.245 0.234 0.236 0.235 0.248 0.235 8 0.558 0.537 0.537 0.559 0.565 0.561 0.544 0.562 16 1.251 1.224 1.218 1.380 1.284 1.276 1.231 1.313 32 2.676 2.612 2.628 3.406 3.255 3.326 2.639 3.329 64 5.965 5.911 5.929 9.742 9.653 9.911 5.935 9.769 128 15.622 15.281 15.210 24.284 24.313 24.484 15.371 24.360 256 36.663 35.968 35.950 57.401 55.287 52.135 36.194 54.941 512 81.739 100.683 101.756 127.262 127.765 127.512 94.726 127.513 1024 222.161 221.980 218.134 313.828 306.046 304.332 220.758 308.069 1024 Square Check Maximum Noise Average Noise SP 9.999520e-01 3.346482e-06 4.565234e-11 DP 1.000000e+00 1.133294e-23 1.428110e-28 Total Elapsed Time 4.0 seconds System 2 Android 11 and Comparisons ARM/Intel FFT Benchmark 3c 4A8 24-Jul-2021 22.31 Compiled for 64 bit ARM v8a Benchmark 2 Size milliseconds Average Compare Sys 2/1 K Single Precision Double Precision SP DP SP DP 1 0.053 0.044 0.042 0.030 0.029 0.028 0.046 0.029 1.24 1.79 2 0.098 0.090 0.090 0.061 0.061 0.060 0.093 0.061 1.25 1.74 4 0.207 0.195 0.198 0.132 0.168 0.152 0.200 0.151 1.24 1.56 8 0.451 0.425 0.458 0.324 0.323 0.323 0.445 0.323 1.22 1.74 16 1.014 0.975 0.973 0.825 0.773 0.770 0.987 0.789 1.25 1.66 32 1.578 1.542 1.485 1.774 1.764 1.735 1.535 1.758 1.72 1.89 64 3.426 3.446 3.324 4.319 4.329 4.383 3.399 4.344 1.75 2.25 128 10.467 8.002 7.957 10.141 9.951 10.000 8.809 10.031 1.74 2.43 256 20.145 19.834 19.305 24.855 26.867 25.645 19.761 25.789 1.83 2.13 512 44.796 43.895 42.502 60.550 60.966 60.162 43.731 60.559 2.17 2.11 1024 98.944 96.987 96.128 142.164 141.496 139.192 97.353 140.951 2.27 2.19 1024 Square Check Maximum Noise Average Noise SP 9.999520e-01 3.346482e-06 4.565234e-11 DP 1.000000e+00 1.133294e-23 1.428110e-28 System 2/1 SP 
1.0000000000 1.0000000000 1.0000000000 DP 1.0000000000 1.0000000000 1.0000000000 Total Elapsed Time 2.1 seconds |
For more information on the Whetstone Benchmark, see the stand alone version, above. The multithreading version runs multiple copies of the same shared code, with separate variables.
Before comparing results, it should be noted that the high Fixpt MOPS are impossible to achieve, where the compiler has found that some of the code can be ignored without changing the calculated result. However, the time for this function has little effect on the overall MWIPS rating.
With mixed MHz CPU cores and big.LITTLE architectures, it is more difficult to predict performance using multithreaded benchmarks. With 8 identical cores, performance would normally nearly double on using twice as many cores. This also applied to System 1, using Cortex A73 and A53 cores, both at 2.0 GHz. This might be because their architectures are similar in executing the simple Whetstone benchmark test functions.
System 2, with its two fast Kryo 480 CPUs and six slower Kryo 460 ones, with clearly less advanced architecture, led to the overall System 2/1 MWIPS comparison reducing from 1.61 at 2 threads, to 1.28 at 4, then 1.04 at 8.
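The scaling behaviour described above can be checked with a short calculation of the gain obtained each time the thread count is doubled. The MWIPS values are copied from the results for this benchmark; the function name `doubling_gains` is chosen here for illustration.

```python
# MWIPS by thread count, copied from the MP-Whetstone results.
mwips = {
    "System 1": {1: 2844.5, 2: 5473.0, 4: 11006.8, 8: 20297.4},
    "System 2": {1: 4326.6, 2: 8782.2, 4: 13968.6, 8: 21038.9},
}


def doubling_gains(by_threads):
    """Return the speed gain for each doubling of the thread count."""
    threads = sorted(by_threads)
    return {
        t: by_threads[t] / by_threads[prev]
        for prev, t in zip(threads, threads[1:])
    }


gains = {name: doubling_gains(values) for name, values in mwips.items()}

for name, system_gains in gains.items():
    line = ", ".join(f"{t}T x{g:.2f}" for t, g in system_gains.items())
    print(f"{name}: {line}")
```

System 1 stays close to a doubling at each step, whereas System 2 falls to around 1.5 to 1.6 times once more than two threads are used, which is when the slower cores have to be included.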
Samples from my MHz monitor, recorded whilst running the benchmark on System 2, are provided below (the frequencies are slightly higher than the initial specification obtained). These appear to show that appropriate frequencies were used in all cases.
System 1 Android 10 ARM/Intel MP-Whetstone Benchmark 4A8 22-Jul-2021 22.16 Compiled for 64 bit ARM v8a Using 1, 2, 4 and 8 Threads MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal 1 2 3 MOPS MOPS MOPS MOPS MOPS 1T 2844.5 572.2 549.8 488.6 90.9 46.4 32202.9 2645.1 504.4 2T 5473.0 1057.2 995.7 959.1 173.5 88.8 77646.5 4953.9 990.5 4T 11006.8 2172.5 2077.1 1937.1 344.9 178.3 117441.2 10128.7 1986.3 8T 20297.4 4233.3 4031.0 3758.6 608.0 334.9 330393.0 22268.2 3515.9 Overall Seconds 4.76 1T, 5.05 2T, 5.06 4T, 6.11 8T All calculations produced consistent numeric results Total Elapsed Time 22.2 seconds System 2 Android 11 ARM/Intel MP-Whetstone Benchmark 4A8 08-Aug-2021 16.43 Compiled for 64 bit ARM v8a Using 1, 2, 4 and 8 Threads MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal 1 2 3 MOPS MOPS MOPS MOPS MOPS 1T 4326.6 1010.2 984.1 781.9 135.0 67.5 19781.8 2975.5 746.6 2T 8782.2 1850.3 2125.6 1603.9 270.4 133.8 103019.0 5978.0 1505.0 4T 13968.6 3189.1 3372.5 2641.2 438.4 233.3 148677.8 10556.0 2473.3 8T 21038.9 4535.4 4984.9 4171.4 525.4 385.8 353966.8 20385.7 3457.6 Overall Seconds 4.57 1T, 4.54 2T, 6.91 4T, 7.86 8T All calculations produced consistent numeric results Total Elapsed Time 24.8 seconds System 2/1 Comparison MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal 1T 1.47 1.60 1.41 1.55 1.47 1.43 1.45 0.96 1.47 2T 1.61 1.85 2.12 1.67 1.51 1.59 1.21 1.21 1.52 4T 1.28 1.38 1.59 1.40 1.24 1.34 1.08 1.07 1.24 8T 1.04 1.11 1.22 1.13 0.87 1.15 1.07 0.91 0.99 Sample CPU MHz Measurements Core 0 1 2 3 4 5 6 7 Secs 1 1709 1478 1805 1805 1478 1709 2035 2035 1 Core 2 1805 1709 1805 1805 1478 1805 2035 2035 5 1805 1709 1805 1805 1325 1805 2035 2035 2 Cores 6 1805 1805 1805 1478 1805 1709 2035 2035 11 1805 1805 1805 1709 1805 1805 2035 2035 4 Cores 12 1805 1478 1478 1478 1709 1805 2035 2035 16 1805 1805 1805 1805 1805 1805 2035 2035 8 Cores 17 1805 1805 1805 1805 1805 1805 2035 2035 |
This benchmark does not provide reasonable increases in measured performance using multiple cores, probably because many of the variables used are shared by all threads. Results using one thread are only slightly slower than from the single core version, indicating that threading overheads were not excessive.
The lack of improvement using multiple cores probably invalidates comparisons of the two systems.
System 1 Android 10

 ARM/Intel MP-Dhrystone 2 Benchmark 4A8 22-Jul-2021 22.13
 Compiled for 64 bit ARM v8a

 Using 1, 2, 4 and 8 Threads

 Threads                        1        2        4        8
 Seconds                     0.53     0.94     1.56     6.52
 Dhrystones per Second   14969439 16968330 20569253  9817217
 VAX MIPS rating             8520     9658    11707     5587

 Internal pass count correct all threads

 Total Elapsed Time 10.0 seconds

System 2 Android 11

 ARM/Intel MP-Dhrystone 2 Benchmark 4A8 27-Jul-2021 21.04
 Compiled for 64 bit ARM v8a

 Using 1, 2, 4 and 8 Threads

 Threads                        1        2        4        8
 Seconds                     0.68     1.75     3.90    14.16
 Dhrystones per Second   23379531 18244401 16418508  9040403
 VAX MIPS rating            13307    10384     9345     5145

 Internal pass count correct all threads

 Total Elapsed Time 21.2 seconds

System 2/1 Comparison

 Threads                        1        2        4        8
 VAX MIPS rating             1.56     1.08     0.80     0.92
This is a multithreading version of the above. Further details and results can be found in android neon benchmarks.htm (2013) and the 2017 Android Report.
This benchmark is not generally available with the new 4A8 compilation, as overall running time had increased to more than 400 seconds on a new phone.
The first comparisons provided for each system are for reading all data, demonstrating changes in throughput on doubling the number of CPU cores used. At 49152 KB, RAM and bus throughput can be the limiting factor, and this can be constant on using more cores. At 12.3 KB, all cores should be accessing L1 cache based data, when variations in the speed of different cores can be significant. The latter can also influence tests at 122.9 KB, but with performance gains provided from L1 caches due to the repetitive reading by more cores.
Below are full comparisons of all measurements and an average of each row that is generally representative of all entries. The benchmark uses streamed AND functions, where performance is probably proportional to CPU MHz on the two systems, as demonstrated using one and two threads and L1 cache. Then, at 4 and 8 threads, the slow System 2 cores lead to System 1 being faster.
System 2 was mainly much faster using L2 and L3 caches, but not quite so from RAM to start with, until the slower System 2 cores came into play.
System 1 Android 10 ARM/Intel MP-BusSpd2 Benchmark 4A8 22-Jul-2021 22.12 Compiled for 64 bit ARM v8a MB/Second Reading Data, 1, 2, 4 and 8 Threads RdAll KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll Gain 12.3 1T 6579 6925 7179 7392 7433 7698 2T 10723 12045 12847 13687 13935 12496 1.62 4T 18313 21611 24525 26576 27695 24033 1.92 8T 16805 20127 41695 37245 51275 37240 1.55 122.9 1T 1282 1261 2450 3990 5987 7665 2T 1660 1634 3201 5542 8427 11388 1.49 4T 1901 1972 4028 7803 14406 22338 1.96 8T 2917 3020 6161 12616 25020 32368 1.45 49152 1T 562 573 1347 2646 4712 7303 2T 616 641 1383 2759 5263 9432 1.29 4T 845 992 1387 2816 5609 10698 1.13 8T 947 914 1854 4119 7329 13010 1.22 No Errors Found Total Elapsed Time 55.1 seconds System 2 Android 11 ARM/Intel MP-BusSpd2 Benchmark 4A8 27-Jul-2021 20.49 Compiled for 64 bit ARM v8a MB/Second Reading Data, 1, 2, 4 and 8 Threads RdAll KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll Gain 12.3 1T 7138 7264 7583 7609 7683 7588 2T 8623 12192 13891 14498 15041 14914 1.97 4T 8020 11035 15436 18877 22120 19132 1.28 8T 12476 15114 28710 25108 37940 27187 1.42 122.9 1T 1857 3441 6018 7969 7287 7211 2T 3918 7120 11024 14414 15691 15856 2.20 4T 4740 7401 12315 17656 20651 18955 1.20 8T 4848 8516 15255 25611 37474 33515 1.77 49152 1T 559 792 1757 3208 6009 7219 2T 752 1120 2054 3630 7162 14022 1.94 4T 769 942 1737 3423 7200 14738 1.05 8T 697 905 1771 3668 7318 14452 0.98 No Errors Found Total Elapsed Time 55.0 seconds System 2/1 Comparison KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll Average 12.3 1T 1.08 1.05 1.06 1.03 1.03 0.99 1.04 2T 0.80 1.01 1.08 1.06 1.08 1.19 1.04 4T 0.44 0.51 0.63 0.71 0.80 0.80 0.65 8T 0.74 0.75 0.69 0.67 0.74 0.73 0.72 122.9 1T 1.45 2.73 2.46 2.00 1.22 0.94 1.80 2T 2.36 4.36 3.44 2.60 1.86 1.39 2.67 4T 2.49 3.75 3.06 2.26 1.43 0.85 2.31 8T 1.66 2.82 2.48 2.03 1.50 1.04 1.92 49152 1T 0.99 1.38 1.30 1.21 1.28 0.99 1.19 2T 1.22 1.75 1.49 1.32 1.36 1.49 1.44 4T 0.91 0.95 1.25 1.22 1.28 1.38 1.16 8T 0.74 0.99 0.96 0.89 1.00 1.11 0.95 |
The System 2/1 performance comparisons were between 0.74 and 5.41, with the widest variations on using four or eight threads. System 2 was clearly the winner using one or two threads, with read/write and at 122.9 KB data size. Then System 1 was the best on read only tests at four and eight threads.
System 1 Android 10 ARM/Intel MP-RndMem Benchmark 4A8 22-Jul-2021 22.14 Compiled for 64 bit ARM v8a MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.29 1T 9927 6127 9763 7586 2T 16033 5560 16875 5334 4T 33574 4718 32230 4201 8T 51899 3553 35118 3749 122.9 1T 9119 7515 3479 3644 2T 12483 5231 4191 2194 4T 20634 4189 5287 1392 8T 36374 3333 7513 1645 12288 1T 8168 4727 227 178 2T 9980 3464 403 172 4T 11411 2540 632 108 8T 18753 1693 848 86 No Errors Found Total Elapsed Time 48.7 seconds System 2 Android 11 ARM/Intel MP-RndMem Benchmark 4A8 27-Jul-2021 20.51 Compiled for 64 bit ARM v8a MB/Second Using 1, 2, 4 and 8 Threads KB SerRD SerRDWR RndRD RndRDWR 12.29 1T 14856 15879 14847 13791 2T 28557 15185 27599 14283 4T 29740 15233 29814 13809 8T 43914 14639 33087 8970 122.9 1T 12174 12422 8374 7495 2T 24468 12783 17755 7664 4T 30157 12649 17826 7525 8T 45480 8194 21182 4668 12288 1T 11517 5872 439 432 2T 14210 5893 472 401 4T 16404 5852 505 429 8T 17490 4001 631 395 No Errors Found Total Elapsed Time 46.7 seconds System 2/1 Comparison KB SerRD SerRDWR RndRD RndRDWR 12.3 1T 1.50 2.59 1.52 1.82 2T 1.78 2.73 1.64 2.68 4T 0.89 3.23 0.93 3.29 8T 0.85 4.12 0.94 2.39 122.9 1T 1.34 1.65 2.41 2.06 2T 1.96 2.44 4.24 3.49 4T 1.46 3.02 3.37 5.41 8T 1.25 2.46 2.82 2.84 12288 1T 1.41 1.24 1.93 2.43 2T 1.42 1.70 1.17 2.33 4T 1.44 2.30 0.80 3.97 8T 0.93 2.36 0.74 4.59 |
The arithmetic operations executed are of the form x[i] = (x[i] + a) * b - (x[i] + c) * d + (x[i] + e) * f with 2 and 32 operations per input data word, using 1, 2, 4 and 8 threads. Data sizes are limited to three to use L1 cache, L2 cache and RAM at 12.8, 128 and 12800 KB (3200, 32000 and 3200000 single precision floating point words). Each thread uses the same calculations but accessing different segments of the data. The program checks for consistent numeric results, primarily to show that all calculations are carried out and can be run.
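The calculation and the MFLOPS arithmetic can be sketched as follows. This is an illustration only, not the benchmark's code. The expression as written above contains eight operations per word; it is assumed here that the 2 and 32 operation tests shorten or extend the same pattern. The helper names, the constants and the equal split of data between threads are also assumptions.

```python
WORD_BYTES = 4  # single precision floating point word


def kernel_8_ops(x, a, b, c, d, e, f):
    """Apply the expression shown in the text: 4 adds, 1 subtract
    and 3 multiplies, which is 8 operations per input word."""
    return [(v + a) * b - (v + c) * d + (v + e) * f for v in x]


def split_segments(n_words, n_threads):
    """Give each thread its own segment of the data
    (assumes n_words divides evenly by n_threads)."""
    size = n_words // n_threads
    return [(t * size, (t + 1) * size) for t in range(n_threads)]


def mflops(n_words, ops_per_word, passes, seconds):
    """Millions of floating point operations per second."""
    return n_words * ops_per_word * passes / seconds / 1e6


l1_words = 3200  # 12.8 KB test size from the text
print(l1_words * WORD_BYTES / 1000, "KB")
print(split_segments(l1_words, 8)[:2])
print(kernel_8_ops([1.0], 1.0, 2.0, 2.0, 3.0, 3.0, 4.0))
```

For example, 3200 words at 8 operations per word, repeated 1000 times in 2 seconds, would be rated at 12.8 MFLOPS by `mflops`.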
Based on Intel SIMD performance, with 128 bit registers and linked (fused) multiply and add, up to eight single precision floating point operations could be expected per clock cycle, or 16 GFLOPS per core at 2 GHz. System 2, at least, approached that with 12.2 and 23.5 GFLOPS at one and two threads, around twice as fast as System 1. This also demonstrates that SIMD instructions were generated by the compiler.
The SIMD implementation also provided a System 2 performance advantage at four and eight threads.
System 1 Android 10 ARM/Intel MP-MFLOPS2 Benchmark 4A8 22-Jul-2021 22.11 Compiled for 64 bit ARM v8a FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 3912 3669 2315 6414 6367 6361 2T 3120 3397 2273 11170 11385 11459 4T 5301 7184 2430 21738 20485 20129 8T 8789 11002 2416 29145 29936 28507 Results x 100000, 0 indicates ERRORS 1T 40392 76406 99700 35218 66014 99520 2T 40392 76406 99700 35218 66014 99520 4T 40392 76406 99700 35218 66014 99520 8T 40392 76406 99700 35218 66014 99520 Total Elapsed Time 12.4 seconds System 2 Android 11 ARM/Intel MP-MFLOPS2 Benchmark 4A8 27-Jul-2021 20.53 Compiled for 64 bit ARM v8a FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 6977 8034 2984 12178 12139 12137 2T 10759 10573 2814 23032 23509 23674 4T 11813 11973 2671 26022 26173 25387 8T 15998 14536 2442 34803 35686 34050 Results x 100000, 0 indicates ERRORS 1T 40392 76406 99700 35218 66014 99520 2T 40392 76406 99700 35218 66014 99520 4T 40392 76406 99700 35218 66014 99520 8T 40392 76406 99700 35218 66014 99520 Total Elapsed Time 7.4 seconds System 2/1 Comparison KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 1.78 2.19 1.29 1.90 1.91 1.91 2T 3.45 3.11 1.24 2.06 2.06 2.07 4T 2.23 1.67 1.10 1.20 1.28 1.26 8T 1.82 1.32 1.01 1.19 1.19 1.19 Results Comparison 1T 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 2T 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 4T 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 8T 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 |
System 1 Android 10

 ARM NEON-MFLOPS2-MP Benchmark 4A8 22-Jul-2021 22.17
 Compiled for 64 bit ARM v8a

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

              2 Ops/Word                32 Ops/Word
 KB       12.8     128   12800     12.8     128   12800
 MFLOPS
 1T       3683    3507    2275     6485    6405    6301
 2T       1845    2692    1564     9914   10164   10025
 4T       3542    4042    2308    16358   16976   16623
 8T       5953    6765    2377    22075   25944   25316
 Results x 100000, 12345 indicates ERRORS
 1T      44934   86735   99850    36770   79897   99759
 2T      44934   86735   99850    36770   79897   99759
 4T      44934   86735   99850    36770   79897   99759
 8T      44934   86735   99850    36770   79897   99759

 Total Elapsed Time 7.1 seconds

System 2 Android 11

 ARM NEON-MFLOPS2-MP Benchmark 4A8 27-Jul-2021 20.54
 Compiled for 64 bit ARM v8a

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

              2 Ops/Word                32 Ops/Word
 KB       12.8     128   12800     12.8     128   12800
 MFLOPS
 1T       6721    7708    2944    12811   12452   12007
 2T       6530    6343    2495    22843   23026   22570
 4T       7311    6678    2449    24362   25438   24994
 8T      11900   11942    2386    32721   32459   34292
 Results x 100000, 12345 indicates ERRORS
 1T      44934   86735   99850    36770   79897   99759
 2T      44934   86735   99850    36770   79897   99759
 4T      44934   86735   99850    36770   79897   99759
 8T      44934   86735   99850    36770   79897   99759

 Total Elapsed Time 3.9 seconds

System 2/1 Comparison

 KB       12.8     128   12800     12.8     128   12800
 MFLOPS
 1T       1.82    2.20    1.29     1.98    1.94    1.91
 2T       3.54    2.36    1.60     2.30    2.27    2.25
 4T       2.06    1.65    1.06     1.49    1.50    1.50
 8T       2.00    1.77    1.00     1.48    1.25    1.35
 Results Comparison
 1T    1.00000 1.00000 1.00000  1.00000 1.00000 1.00000
 2T    1.00000 1.00000 1.00000  1.00000 1.00000 1.00000
 4T    1.00000 1.00000 1.00000  1.00000 1.00000 1.00000
 8T    1.00000 1.00000 1.00000  1.00000 1.00000 1.00000
All tests are shown to be around twice as fast on System 2.
System 1 Android 10

 Android Java OpenGL Benchmark 4A8 22-Jul-2021 22.00

            --------- Frames Per Second --------
 Triangles  WireFrame   Shaded  Shaded+ Textured

   9000+        42.94    46.30    36.80    31.23
  18000+        24.60    25.91    22.64    18.73
  36000+        13.98    14.25    13.30    10.91

 Screen Pixels 720 Wide 1339 High
 Total Elapsed Time 120.6 seconds

System 2 Android 11

 Android Java OpenGL Benchmark 4A8 25-Jul-2021 08.24

            --------- Frames Per Second --------
 Triangles  WireFrame   Shaded  Shaded+ Textured

   9000+        89.66    89.76    81.53    67.58
  18000+        56.37    56.14    48.89    39.09
  36000+        29.16    29.39    27.47    21.27

 Screen Pixels 720 Wide 1339 High
 Total Elapsed Time 120.4 seconds

System 2/1 Comparison

 Triangles  WireFrame   Shaded  Shaded+ Textured

   9000+         2.09     1.94     2.22     2.16
  18000+         2.29     2.17     2.16     2.09
  36000+         2.09     2.06     2.07     1.95
This all-Java benchmark draws simple objects, in numbers ranging from small to rather excessive, to measure drawing performance, again via Frames Per Second (FPS). Five 10 second tests draw on a background of continuously changing colour shades.
The System 2 performance advantage grew with increasing drawing complexity, from 1.13 to 1.16 times on the two simplest tests to 2.68 times on the most complex.
System 1 Android 10

 Android Java Drawing Benchmark 4A822-Jul-2021 21.58

 Test                              Frames      FPS

 Display PNG Bitmap Twice             597    59.66
 Plus 2 SweepGradient Circles         527    52.63
 Plus 200 Random Small Circles        455    45.45
 Plus 320 Long Lines                  329    32.82
 Plus 4000 Random Small Circles       100     9.90

 Screen pixels 720 Wide 1339 High
 Total Elapsed Time 50.2 seconds

System 2 Android 11

 Android Java Drawing Benchmark 4A825-Jul-2021 08.28

 Test                              Frames      FPS

 Display PNG Bitmap Twice             695    69.49
 Plus 2 SweepGradient Circles         599    59.73
 Plus 200 Random Small Circles        772    77.12
 Plus 320 Long Lines                  714    71.37
 Plus 4000 Random Small Circles       266    26.52

 Screen pixels 720 Wide 1339 High
 Total Elapsed Time 50.1 seconds

System 2/1 Comparison

 Display PNG Bitmap Twice                     1.16
 Plus 2 SweepGradient Circles                 1.13
 Plus 200 Random Small Circles                1.70
 Plus 320 Long Lines                          2.17
 Plus 4000 Random Small Circles               2.68
System 1 Android 10

 Android Java Whetstone Benchmark 4A8 22-Jul-2021 21.57

 Test         MFLOPS      MOPS  millisecs        Results

 N1 float     360.23                0.053   -1.124750137
 N2 float     348.73                0.385   -1.131330490
 N3 if                  980.11      0.106    1.000000000
 N4 fixpt              1415.09      0.223   12.000000000
 N5 cos                  83.62      0.995    0.499110132
 N6 float     191.96                2.810    0.999999821
 N7 equal               290.11      0.637    3.000000000
 N8 exp                  43.51      0.855    0.935364604

 MWIPS       1649.10                6.064

 Total Elapsed Time 16.1 seconds

System 2 Android 11

 Android Java Whetstone Benchmark 4A8 25-Jul-2021 08.32

 Test         MFLOPS      MOPS  millisecs        Results

 N1 float     609.91                0.031   -1.124750137
 N2 float     557.68                0.241   -1.131330490
 N3 if                  990.43      0.105    1.000000000
 N4 fixpt              2817.53      0.112   12.000000000
 N5 cos                 136.06      0.612    0.499110132
 N6 float     271.06                1.990    0.999999821
 N7 equal               619.30      0.298    3.000000000
 N8 exp                  65.67      0.567    0.935364604

 MWIPS       2528.33                3.955

 Total Elapsed Time 13.6 seconds

System 2/1 Comparison

 N1 float       1.69                         1.000000000
 N2 float       1.60                         1.000000000
 N3 if                    1.01               1.000000000
 N4 fixpt                 1.99               1.000000000
 N5 cos                   1.63               1.000000000
 N6 float       1.41                         1.000000000
 N7 equal                 2.13               1.000000000
 N8 exp                   1.51               1.000000000

 MWIPS          1.53
System 2 was nearly twice as fast as System 1 via the Java route.
System 1 Android 10

 Android Java Linpack Benchmark 4A8 22-Jul-2021 22.11

 Speed              461.60 MFLOPS
 norm. resid          1.67
 resid      7.41628980e-14
 machep     2.22044605e-16
 x[0]-1    -1.49880108e-14
 x[n-1]-1  -1.89848137e-14

System 2 Android 11

 Android Java Linpack Benchmark 4A8 25-Jul-2021 08.33
                                    System 2/1
                                    Comparison
 Speed              898.10 MFLOPS      1.95
 norm. resid          1.67             1.00000
 resid      7.41628980e-14             1.00000
 machep     2.22044605e-16             1.00000
 x[0]-1    -1.49880108e-14             1.00000
 x[n-1]-1  -1.89848137e-14             1.00000

System 2 C Version

 ARM/Intel DP Linpack Benchmark 4A8 24-Jul-2021 21.00
 Compiled for 64 bit ARM v8a        C/Java
                                    Comparison
 Speed             1985.71 MFLOPS      2.21
 norm. resid           1.7             print rounding differs
 resid      7.41628980e-14             1.00000
 machep     2.22044605e-16             1.00000
 x[0]-1    -1.49880108e-14             1.00000
 x[n-1]-1  -1.89848137e-14             1.00000
Test 1 - Write and read three 8 and 16 MB files; Results given in MBytes/second
Test 2 - Write three 8 MB files, read can be cached in RAM; Results given in MBytes/second
Test 3 - Random write and read 1 KB from 4 to 16 MB; Results are average time in milliseconds
Test 4 - Write and read 200 files 4 KB to 16 KB; Results in MB/sec, msecs/file and delete seconds.
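As a rough illustration of the kind of measurement made in Tests 1 and 2, the following sketch writes and then reads back one file, reporting MBytes/second for each. It is an assumed stand-in, not the DriveSpeed code: the function name, file name and block size are hypothetical, and the read here will normally be served from the RAM cache, as noted for Test 2.

```python
import os
import tempfile
import time


def write_read_mb_per_sec(directory, size_mb=8, block_kb=64):
    """Write then read one file of size_mb; return (write MB/s, read MB/s)."""
    path = os.path.join(directory, "drivespeed_sketch.tmp")  # hypothetical name
    block = b"\xA5" * (block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())          # ask the OS to push data to the drive
    write_secs = max(time.perf_counter() - start, 1e-9)

    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_kb * 1024)
            if not chunk:
                break
            total += len(chunk)
    read_secs = max(time.perf_counter() - start, 1e-9)

    os.remove(path)
    if total != size_mb * 1024 * 1024:
        raise IOError("read back a different number of bytes than written")
    return size_mb / write_secs, size_mb / read_secs


with tempfile.TemporaryDirectory() as tmp:
    write_speed, read_speed = write_read_mb_per_sec(tmp, size_mb=1)
    print(f"Write {write_speed:.1f} MB/sec, Read {read_speed:.1f} MB/sec (cached)")
```

Uncached read speeds cannot be obtained this way; the procedure described next, using the saved files after a power cycle, is needed for that.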
The benchmark has two run buttons: RunI to test the internal drive and RunS for an SD card. With RunI, the code to use Direct I/O, avoiding caching, no longer works. However, fully cached results can still be useful. RunS originally worked using a default file path that no longer applies.
A More button is provided to allow uncached reading speed measurements by selecting More/Don’t Delete before RunI to keep the large files, then power off and on followed by More/Read Only (plus Don’t Delete if still required) and RunI.
Below, fully cached and read only results are provided for the two systems, mainly to demonstrate that the programs worked under these versions of Android. There are indications that System 2 was faster in certain areas, but a number of runs would be required to clarify this, including using the same version of Android.
System 1 Android 10 Internal Drive MB 51050 Free 40895

 Android DriveSpeed1 Benchmark 4A8 28-Jul-2021 21.04
 Internal Drive Data Cached
 Compiled for 64 bit ARM v8a

                       MBytes/Second
   MB   Write1  Write2  Write3   Read1   Read2   Read3

    8    840.5  1049.7  1260.5  2146.2  1950.5  2314.2
   16    757.5   838.3  1040.0  1995.7  2186.2  2094.4
 Cached
    8    793.8   581.6   595.7  1937.8  2184.7  2012.3

 Random         Write                    Read
 From MB     4       8      16       4       8      16
 msecs    0.30    0.21    0.22    0.00    0.00    0.00

 200 Files      Write                    Read           Delete
 File KB     4       8      16       4       8      16    secs
 MB/sec  39.32   80.77  101.42  218.89  345.10  318.04
 msecs    0.10    0.10    0.16    0.02    0.02    0.05   0.019
 Files Deleted

 Total Elapsed Time 16.4 seconds

System 2 Android 11 Internal Drive MB 46183 Free 38203

 Android DriveSpeed1 Benchmark 4A8 28-Jul-2021 20.49
 Internal Drive Data Cached
 Compiled for 64 bit ARM v8a

                       MBytes/Second
   MB   Write1  Write2  Write3   Read1   Read2   Read3

    8   1622.4  1756.0  1750.5  2215.9  2564.1  2913.6
   16   1683.2  1618.8  1242.2  2260.4  2642.7  2298.9
 Cached
    8    872.6  1482.7  1640.6  2118.8  2918.7  3021.2

 Random         Write                    Read
 From MB     4       8      16       4       8      16
 msecs    0.38    0.45    0.46    0.00    0.00    0.00

 200 Files      Write                    Read           Delete
 File KB     4       8      16       4       8      16    secs
 MB/sec  69.17   99.34   58.97  533.87  762.62  489.21
 msecs    0.06    0.08    0.28    0.01    0.01    0.03   0.009
 Files Deleted

 Total Elapsed Time 16.3 seconds

System 1 Read Only

                       MBytes/Second
   MB   Write1  Write2  Write3   Read1   Read2   Read3

    8      0.0     0.0     0.0   273.7   278.9   253.6

System 2 Read Only

                       MBytes/Second
   MB   Write1  Write2  Write3   Read1   Read2   Read3

    8      0.0     0.0     0.0   136.7   380.5   450.4
There are two main stress test programs that can use multiple threads to exercise (presently) all CPU cores, one using floating point instructions, and the other carrying out integer arithmetic. Further detail is covered in the earlier report - Android Benchmarks For 32 Bit and 64 Bit CPUs from ARM and Intel.pdf and with an update in a 2018 publication. The third program monitors MHz of up to 8 cores. Each of the stress test applications has five buttons:
RunB - Run Benchmark - Runs most combinations of number of threads, data sizes and calculations per data word for the FPU tests. This is mainly to help to decide which options to use for stress testing. The benchmark runs using fixed parameters, carrying out exactly the same number of calculations using all thread combinations and data sizes. The pass count changes according to the number of calculations per word, for the FPU tests.
RunS - Run Stress Tests - Default running time is 15 minutes, with the middle data size, intended for containment in L2 cache, using 8 threads, and 32 operations per word in the FPU tests.
False Errors - These can be caused if the run button is tapped again when the tests are running. The main unique symptoms are multiple “End Time” message displays.
SetS - Specify run time parameters for stress test - These are 1, 2, 4, 8, 16 or 32 threads, 2, 8 or 32 Operations per word for FPU tests, 12.8 or 16 KB, 128 or 160 KB, 12.8 or 16 MB for FPU or Integer tests, and running time in minutes.
Info - Test description and details - This is essentially the same as details provided here.
Save - This provides alternative methods to divert the logged output. Currently I select the Google Drive option, allowing me to access the files on my PCs.
Unexpected Faster Speed - Performance depends on whether the data comes from caches or RAM. Then, increasing the number of threads can lead to CPU cores using dedicated smaller and faster caches.
Sumchecks - The programs include sumchecks to show whether the correct arithmetic calculations were produced, as shown for the benchmark results. For integers, each test section uses a different data pattern for all words, checked by the program after manipulation. Floating point numeric results depend on the number of calculations carried out, constant for stress test reported time slots, easily verified manually.
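The integer sumcheck idea can be illustrated as follows. This is a hypothetical sketch, not the stress test code: the data patterns are those that appear in the Sumcheck column of the logs, but the add and subtract manipulation, the key value and the "ERROR" marker are assumptions chosen only to show a reversible operation followed by a check of every word.

```python
# Hypothetical sketch of a pattern-based integer sumcheck.
# The manipulation and the "ERROR" marker are assumptions for illustration.

MASK = 0xFFFFFFFF
PATTERNS = [0x00000000, 0xFFFFFFFF, 0x5A5A5A5A,
            0xAAAAAAAA, 0xCCCCCCCC, 0x0F0F0F0F]


def manipulate(words, key=0x01234567, passes=4):
    """Apply a reversible add then subtract sequence, modulo 2**32."""
    for _ in range(passes):
        words = [(w + key) & MASK for w in words]
        words = [(w - key) & MASK for w in words]
    return words


def sumcheck(words, pattern):
    """Return the pattern as 8 hex digits if every word still matches it."""
    if all(w == pattern for w in words):
        return f"{pattern:08X}"
    return "ERROR"


def run_section(pattern, n_words=1024):
    """One test section: fill with the pattern, manipulate, then check."""
    return sumcheck(manipulate([pattern] * n_words), pattern)


for p in PATTERNS:
    print(run_section(p))
```

A single corrupted word is enough for the check to fail, which is the property a stress test needs in order to flag arithmetic or memory errors.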
CP_MHz2 measurements are instantaneous at a constant sampling rate, not averages over that time. The program has Set, Run and Save buttons, as above. Default running time is 15 minutes and sampling rate 10 seconds.
Example results of the Stress Test Benchmarks follow later, and then those from extended Reliability type Tests. Those for stress tests are from logs running default parameters, with 15 minutes running time. Some of the latter include only necessary detail. Examples of full output are as follows.
 ARM/Intel MP-Int Stress Test 4A8 25-Aug-2021 20.04.49
 Compiled for 64 bit ARM v8a

             Data                             Same All
 Seconds     Size Threads   MB/sec  Sumcheck   Threads

     8.9   160 KB       8    56504  00000000       Yes
    17.7   160 KB       8    55513  00000000       Yes

 ARM/Intel MP-FPU Stress Test 4A8 25-Aug-2021 19.08.22
 Compiled for 64 bit ARM v8a

             Data           Ops/            Nmeric
 Seconds     Size Threads   Word   MFLOPS  Results

     8.7   128 KB       8     32    38035    35216
    17.1   128 KB       8     32    37603    35216
As seen via the CPU-Z utility app, core MHz values can change extremely rapidly. Here, CP_MHz2.apk provides samples at a selected interval in seconds, each being a representative instantaneous value and not an average. Example output:
 MHz Measurement Test 4A8 25-Aug-2021 19.08.40
 Running time 16 minutes, 30 second samples

                        MHz for Core
    Secs     0     1     2     3     4     5     6     7

    0.00  1478  1478  1709  1478  1478  1190  2035  1402
   30.13  1805  1805  1805  1805  1805  1805  2035  2035
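Instantaneous sampling of this kind could be sketched as below. It assumes the standard Linux cpufreq sysfs file scaling_cur_freq (values in kHz) is readable; whether CP_MHz2 uses this route is not stated, and access may be restricted on some Android versions. The function names and the row layout are illustrative only.

```python
import os


def read_core_mhz(core, root="/sys/devices/system/cpu"):
    """Return the current MHz for one core, or None if it cannot be read."""
    path = os.path.join(root, f"cpu{core}", "cpufreq", "scaling_cur_freq")
    try:
        with open(path) as f:
            return int(f.read().strip()) // 1000   # file holds kHz
    except (OSError, ValueError):
        return None                                # offline core or no access


def sample_all(cores=8, root="/sys/devices/system/cpu"):
    """One instantaneous sample for each core, not an average."""
    return [read_core_mhz(c, root) for c in range(cores)]


def format_row(secs, mhz_values):
    """Lay out one sample as a log row; unreadable cores show n/a."""
    cells = " ".join(f"{m:5d}" if m is not None else "  n/a" for m in mhz_values)
    return f"{secs:8.2f} {cells}"


print(format_row(0.0, sample_all()))
```

Calling sample_all at a fixed interval, for example every 30 seconds with time.sleep, would give a series of rows in the same spirit as the example output.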
The usual relative performance attributes are shown to apply, with System 2 indicated as much faster, with cache based data, using 1 or 2 threads, then possibly slower at 4 and 8.
System 1 Integer Stress Test Android 10

 ARM/Intel MP-Int Stress Test 4A8 22-Jul-2021 22.20.25
 Compiled for 64 bit ARM v8a

                    MB/second
                  KB      KB      MB            Same All
  Secs Thrds      16     160      16  Sumcheck     Tests

   2.7     1    9709    9290    8314  00000000       Yes
   1.8     2   18282   13642   11112  FFFFFFFF       Yes
   1.3     4   29213   31022   10590  5A5A5A5A       Yes
   1.2     8   42274   37461   10819  AAAAAAAA       Yes
   1.2    16   39014   41492   10944  CCCCCCCC       Yes
   1.0    32   42745   44809   12595  0F0F0F0F       Yes

 End Time 22-Jul-2021 22.20.37

System 2 Integer Stress Test Android 11

 ARM/Intel MP-Int Stress Test 4A8 25-Jul-2021 15.43.50
 Compiled for 64 bit ARM v8a

                    MB/second
                  KB      KB      MB            Same All
  Secs Thrds      16     160      16  Sumcheck     Tests

   1.7     1   15241   14764   12857  00000000       Yes
   1.2     2   27887   28069   12937  FFFFFFFF       Yes
   1.2     4   27059   32994   13011  5A5A5A5A       Yes
   1.1     8   40754   44941   12292  AAAAAAAA       Yes
   1.0    16   44902   45542   12959  CCCCCCCC       Yes
   0.9    32   45368   49046   14093  0F0F0F0F       Yes

 End Time 25-Jul-2021 15.44.01

System 2/1 Comparison

                  KB      KB      MB
       Thrds      16     160      16  Sumcheck

           1    1.57    1.59    1.55  SAME
           2    1.53    2.06    1.16  SAME
           4    0.93    1.06    1.23  SAME
           8    0.96    1.20    1.14  SAME
          16    1.15    1.10    1.18  SAME
          32    1.06    1.09    1.12  SAME
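The System 2/1 comparison figures are simple ratios of the MB/second values. A minimal sketch, using the 1 and 4 thread rows from the integer results above (the function name is illustrative):

```python
def comparison(system1, system2):
    """System 2 / System 1 ratio for each column, rounded to 2 decimals."""
    return [round(s2 / s1, 2) for s1, s2 in zip(system1, system2)]


# MB/second at 16 KB, 160 KB and 16 MB, taken from the logs above
s1_1_thread = [9709, 9290, 8314]
s2_1_thread = [15241, 14764, 12857]
s1_4_threads = [29213, 31022, 10590]
s2_4_threads = [27059, 32994, 13011]

print(comparison(s1_1_thread, s2_1_thread))     # 1 thread row
print(comparison(s1_4_threads, s2_4_threads))   # 4 thread row
```

A ratio below 1.00, as at 4 threads with 16 KB, indicates that System 1 was the faster of the two for that test.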
Again, at 12.8 and 128 KB, System 2 was much faster using 1 or 2 threads, but not so at more than 2.
System 1 FPU Stress Test Android 10

 ARM/Intel MP-FPU Stress Test 4A8 23-Aug-2021 12.36.52
 Compiled for 64 bit ARM v8a

                        MFLOPS            Numeric Results
            Ops/     KB     KB     MB      KB     KB     MB
  Secs Thrd Word   12.8    128   12.8    12.8    128   12.8

   0.7   T1    2   2997   2873   2289   40392  76406  99700
   0.6   T2    2   6242   4804   2006   40392  76406  99700
   0.5   T4    2   6176   8232   2295   40392  76406  99700
   0.4   T8    2  10243   9381   2326   40392  76406  99700
   1.8   T1    8   4653   4178   3885   54760  85092  99819
   1.0   T2    8   9244   7870   6270   54760  85092  99819
   0.7   T4    8  13161  13388   9711   54760  85092  99819
   0.6   T8    8  19360  18880   9449   54760  85092  99819
   5.0   T1   32   6229   6289   6183   35218  66014  99520
   2.6   T2   32  11883  11629  12316   35218  66014  99520
   1.6   T4   32  19452  17117  24152   35218  66014  99520
   1.3   T8   32  25532  21875  27148   35218  66014  99520

 End Time 23-Aug-2021 12.39.53

System 2 FPU Stress Test Android 11

 ARM/Intel MP-FPU Stress Test 4A8 25-Jul-2021 15.43.14
 Compiled for 64 bit ARM v8a

                        MFLOPS            Numeric Results
            Ops/     KB     KB     MB      KB     KB     MB
  Secs Thrd Word   12.8    128   12.8    12.8    128   12.8

   0.4   T1    2   7485   7951   2864   40392  76406  99700
   0.4   T2    2  10662   7114   2458   40392  76406  99700
   0.4   T4    2  11787   8245   2335   40392  76406  99700
   0.4   T8    2  12922  11234   2349   40392  76406  99700
   0.7   T1    8  11687  11698   9744   54760  85092  99819
   0.6   T2    8  18046  15611  10570   54760  85092  99819
   0.6   T4    8  16452  14787  10212   54760  85092  99819
   0.5   T8    8  23830  23385   9278   54760  85092  99819
   2.5   T1   32  12156  12491  12408   35218  66014  99520
   1.3   T2   32  23189  23429  23292   35218  66014  99520
   1.2   T4   32  22673  25410  27226   35218  66014  99520
   1.0   T8   32  28894  35044  29383   35218  66014  99520

 End Time 25-Jul-2021 15.43.31

System 2/1 Comparison

                        MFLOPS            Numeric Results
            Ops/     KB     KB     MB      KB     KB     MB
       Thrd Word   12.8    128   12.8    12.8    128   12.8

         T1    2   2.50   2.77   1.25  1.0000 1.0000 1.0000
         T2    2   1.71   1.48   1.23  1.0000 1.0000 1.0000
         T4    2   1.91   1.00   1.02  1.0000 1.0000 1.0000
         T8    2   1.26   1.20   1.01  1.0000 1.0000 1.0000
         T1    8   2.51   2.80   2.51  1.0000 1.0000 1.0000
         T2    8   1.95   1.98   1.69  1.0000 1.0000 1.0000
         T4    8   1.25   1.10   1.05  1.0000 1.0000 1.0000
         T8    8   1.23   1.24   0.98  1.0000 1.0000 1.0000
         T1   32   1.95   1.99   2.01  1.0000 1.0000 1.0000
         T2   32   1.95   2.01   1.89  1.0000 1.0000 1.0000
         T4   32   1.17   1.48   1.13  1.0000 1.0000 1.0000
         T8   32   1.13   1.60   1.08  1.0000 1.0000 1.0000
Phone 1, with the older technology, suffered a reduction of around 20% in performance, with thermal throttling identified by these samples, causing a reduction of about 25% in average core MHz.
Phone 2 appeared to run continuously with all cores at maximum MHz and at effectively constant performance, its advantage over Phone 1 increasing from 12% at the start to 41% after 15 minutes.
System 1 Android 10

                          MHz for Core
  Secs MB/sec     0     1     2     3     4     5     6     7  Average

 Start  50536
    30  50280  1989  1989  1989  1989  1989  1989  1989  1989   1989.0
    60  51753  1989  1989  1989  1989  1989  1989  1989  1989   1989.0
    90  48929  1248  1417  1014  1326  1989  1989  1989  1989   1620.1
   120  46032  1989  1989  1989  1989  1846  1846  1846  1846   1917.5
   150  47259  1989  1989  1989  1989  1846  1846  1846  1846   1917.5
   180  44773  1989  1989  1989  1989  1846  1846  1846  1846   1917.5
   210  44937  1014  1989  1131  1989  1417  1417  1417  1417   1473.9
   240  43210  1248  1326  1417  1417  1781  1781  1781  1781   1566.5
   270  45773   910  1326  1326  1417  1989  1989  1989  1989   1616.9
   300  44227  1989  1989  1989  1989  1716  1716  1716  1677   1847.6
   330  43423  1989  1989  1989  1989  1508  1508  1508  1508   1748.5
   360  44751  1989  1989  1989  1989  1508  1508  1508  1508   1748.5
   390  43341  1989  1989  1989  1989  1417  1417  1417  1417   1703.0
   420  44706  1989  1989  1989  1989  1508  1417  1417  1417   1714.4
   450  43342  1989  1989  1989  1989  1508  1508  1417  1417   1725.8
   480  43055  1989  1989  1989  1989  1508  1508  1508  1508   1748.5
   510  41329  1989  1989  1989  1989  1326  1326  1326  1326   1657.5
   540  41808  1989  1989  1989  1989  1248  1248  1248  1248   1618.5
   570  42219  1989  1989  1989  1989  1248  1248  1248  1248   1618.5
   600  41529  1989  1989  1989  1989  1248  1248  1248  1248   1618.5
   630  42248  1989  1989  1989  1989  1248  1248  1248  1248   1618.5
   660  41451  1989  1989  1989  1989  1248  1248  1248  1248   1618.5
   690  40210  1989  1989  1989  1989  1326  1326  1326  1326   1657.5
   720  40491  1989  1989  1989  1989  1131  1131  1131  1131   1560.0
   750  43947  1989  1989  1989  1989  1326  1326  1326  1326   1657.5
   780  43625  1989  1989  1989  1989  1326  1326  1326  1326   1657.5
   810  41807  1989  1989  1989  1989  1248  1248  1248  1248   1618.5
   840  40617  1989  1989  1989  1989  1248  1248  1248  1248   1618.5
   870  40879  1924  1924  1924  1924  1248  1248  1248  1248   1586.0
   900  40190   910  1625  1846  1989  1417  1417  1417  1417   1504.8

System 2 Android 11

                          MHz for Core
  Secs MB/sec     0     1     2     3     4     5     6     7  Average

 Start  56504
    30  56784  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
    60  56801  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
    90  56836  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   120  57038  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   150  56999  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   180  56313  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   210  56803  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   240  51659  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   270  56591  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   300  55605  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   330  56918  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   360  56549  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   390  57166  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   420  56985  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   450  57127  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   480  56321  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   510  52377  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   540  56553  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   570  56935  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   600  56567  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   630  56971  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   660  56653  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   690  56682  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   720  56555  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   750  56010  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   780  56752  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   810  56862  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   840  56901  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   870  56852  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   900  56828  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
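The percentages quoted for the integer stress test can be reproduced from the Start and 900 second MB/sec values in the logs above. The following is a minimal sketch; the function names are illustrative, and the Average column is taken to be the plain mean of the eight core samples, which agrees with the logged figures.

```python
def percent_reduction(start, end):
    """Percentage fall from the start value to the end value."""
    return 100.0 * (start - end) / start


def percent_advantage(faster, slower):
    """Percentage by which one result exceeds another."""
    return 100.0 * (faster / slower - 1.0)


def average_mhz(core_mhz):
    """Mean of the per-core samples, as in the Average column."""
    return sum(core_mhz) / len(core_mhz)


# Integer stress test MB/sec at the start and after 900 seconds
s1_start, s1_end = 50536, 40190
s2_start, s2_end = 56504, 56828

print(round(percent_reduction(s1_start, s1_end), 1))    # Phone 1 slowdown
print(round(percent_advantage(s2_start, s1_start), 1))  # Phone 2 gain at start
print(round(percent_advantage(s2_end, s1_end), 1))      # Phone 2 gain at end

# System 1 sample at 90 seconds, logged with an Average of 1620.1
print(round(average_mhz([1248, 1417, 1014, 1326, 1989, 1989, 1989, 1989]), 1))
```

Because the MHz values are instantaneous samples, the averages indicate the trend rather than the exact clock speed over each 30 second interval.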
Phone 1 suffered a reduction of 17% to 18% in measured MFLOPS and core MHz. Again, Phone 2 appeared to run continuously at maximum speed, with performance gains, over Phone 1, of 21% at the start, increasing to 46% after 15 minutes.
System 1 Android 10

                          MHz for Core
  Secs MFLOPS     0     1     2     3     4     5     6     7  Average

 Start  31309
    30  27427  1989  1989  1989  1989  1989  1989  1989  1989   1989.0
    60  30330  1989  1989  1989  1989  1846  1846  1846  1846   1917.5
    90  27311  1989  1989  1989  1989  1846  1846  1846  1846   1917.5
   120  26744  1989  1989  1989  1989  1989  1989  1989  1989   1989.0
   150  27714  1989  1989  1989  1989  1716  1716  1716  1716   1852.5
   180  26317  1989  1989  1989  1989  1248  1248  1248  1248   1618.5
   210  26750  1989  1989  1989  1989  1625  1625  1625  1625   1807.0
   240  27494  1989  1989  1989  1989  1625  1625  1625  1625   1807.0
   270  26435  1989  1989  1989  1989  1508  1508  1508  1508   1748.5
   300  24936  1989  1989  1989  1989  1508  1508  1508  1508   1748.5
   330  26723  1989  1989  1989  1989  1508  1508  1508  1508   1748.5
   360  26770  1989  1989  1989  1989  1508  1508  1508  1508   1748.5
   390  26950  1989  1677  1014  1989  1508  1508  1508  1508   1587.6
   420  26661  1989  1989  1989  1989  1417  1417  1417  1417   1703.0
   450  26232  1989  1989  1989  1989  1417  1417  1417  1417   1703.0
   480  26988  1989  1989  1989  1989  1326  1326  1326  1326   1657.5
   510  25936  1989  1989  1989  1989  1417  1417  1417  1417   1703.0
   540  25953  1248  1417  1248  1248  1417  1417  1417  1417   1353.6
   570  25431  1989  1989  1989  1989  1417  1417  1417  1417   1703.0
   600  26234  1248  1131  1014  1326  1989  1989   793  1989   1434.9
   630  26008  1989  1989  1989  1989  1326  1326  1326  1326   1657.5
   660  26146  1989  1989  1989  1989  1417  1417  1417  1417   1703.0
   690  26144  1989  1989  1989  1989  1326  1326  1326  1326   1657.5
   720  25600  1989  1989  1989  1989  1326  1326  1326  1326   1657.5
   750  25470  1989  1989  1989  1989  1326  1326  1326  1326   1657.5
   780  25466  1989  1989  1989  1989  1417  1417  1417  1417   1703.0
   810  24963  1989  1989  1989  1989  1326  1326  1326  1326   1657.5
   840  25516  1989  1989  1989  1989  1417  1417  1417  1417   1703.0
   870  25335  1989  1989  1989  1989  1417  1417  1417  1417   1703.0
   900  25738  1989  1989  1989  1989  1326  1326  1326  1326   1657.5

System 2 Android 11

                          MHz for Core
  Secs MFLOPS     0     1     2     3     4     5     6     7  Average

 Start  38035
    30  37416  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
    60  37635  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
    90  37849  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   120  37581  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   150  37826  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   180  37793  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   210  37668  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   240  37791  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   270  37894  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   300  37456  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   330  37587  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   360  37568  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   390  37568  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   420  37709  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   450  37619  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   480  37452  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   510  37762  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   540  37935  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   570  37803  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   600  37684  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   630  37890  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   660  37818  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   690  37874  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   720  37569  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   750  37604  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   780  37764  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   810  37675  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   840  37678  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   870  37525  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
   900  37652  1805  1805  1805  1805  1805  1805  2035  2035   1862.5
             Data                             Same All  System
 Seconds     Size Threads   MB/sec  Sumcheck   Threads     2/1

 System 1 Android 10 - 2 Threads
    10.0   160 KB       2    18837  00000000       Yes
    19.7   160 KB       2    18613  00000000       Yes
   892.3   160 KB       2    18815  AAAAAAAA       Yes
   901.9   160 KB       2    18821  AAAAAAAA       Yes
 System 2 Android 11 - 2 Threads
     9.9   160 KB       2    30768  00000000       Yes
    19.5   160 KB       2    30747  00000000       Yes    1.65
   893.6   160 KB       2    30763  AAAAAAAA       Yes
   903.2   160 KB       2    30758  AAAAAAAA       Yes    1.63

 System 1 Android 10 - 4 Threads
     9.2   160 KB       4    38542  00000000       Yes
    18.1   160 KB       4    38418  00000000       Yes
   894.0   160 KB       4    31946  AAAAAAAA       Yes
   904.6   160 KB       4    32553  AAAAAAAA       Yes
 System 2 Android 11 - 4 Threads
     9.5   160 KB       4    36361  00000000       Yes
    18.7   160 KB       4    36326  00000000       Yes    0.95
   891.4   160 KB       4    36342  AAAAAAAA       Yes
   900.6   160 KB       4    36329  CCCCCCCC       Yes    1.12

 System 1 Android 10 - 32 Threads
    11.0    16 MB      32    12723  00000000       Yes
    21.0    16 MB      32    13004  00000000       Yes
   896.7    16 MB      32    12233  5A5A5A5A       Yes
   907.4    16 MB      32    12305  5A5A5A5A       Yes
 System 2 Android 11 - 32 Threads
    10.7    16 MB      32    15301  00000000       Yes
    20.5    16 MB      32    15851  00000000       Yes    1.22
   890.3    16 MB      32    15083  5A5A5A5A       Yes
   900.7    16 MB      32    14971  5A5A5A5A       Yes    1.22
             Data           Ops/            Nmeric  System
 Seconds     Size Threads   Word   MFLOPS  Results     2/1

 System 1 Android 10 - 2 Threads
     8.0   128 KB       2      2     6918    40015
    15.8   128 KB       2      2     6942    40015
   893.4   128 KB       2      2     6021    40015
   901.9   128 KB       2      2     6378    40015
 System 2 Android 11 - 2 Threads
     5.7   128 KB       2      2    17933    40015
    11.4   128 KB       2      2    17939    40015    2.58
   899.8   128 KB       2      2    17920    40015
   905.7   128 KB       2      2    17203    40015    2.70

 System 1 Android 10 - 4 Threads
     9.4   128 KB       4     32    25153    35216
    18.5   128 KB       4     32    24895    35216
   896.5   128 KB       4     32    20789    35216
   907.6   128 KB       4     32    20266    35216
 System 2 Android 11 - 4 Threads
    10.0   128 KB       4     32    27627    35216
    19.7   128 KB       4     32    27632    35216    1.11
   899.9   128 KB       4     32    27684    35216
   911.1   128 KB       4     32    23746    35216    1.17

 System 1 Android 10 - 32 Threads
     8.8  12.8 MB      32     32    32893    88227
    17.9  12.8 MB      32     32    30370    88227
   891.0  12.8 MB      32     32    20635    88227
   903.9  12.8 MB      32     32    21551    88227
 System 2 Android 11 - 32 Threads
     8.8  12.8 MB      32     32    37276    86674
    17.4  12.8 MB      32     32    37221    86674    1.23
   894.1  12.8 MB      32     32    37619    86674
   902.6  12.8 MB      32     32    37366    86674    1.73